Adversarially Robust Distillation
Knowledge distillation is effective for producing small, high-performance
neural networks for classification, but these small networks are vulnerable to
adversarial attacks. This paper studies how adversarial robustness transfers
from teacher to student during knowledge distillation. We first find that a large
amount of robustness may be inherited by the student even when distilled only on
clean images. Second, we introduce Adversarially Robust Distillation (ARD)
for distilling robustness onto student networks. In addition to producing small
models with high test accuracy like conventional distillation, ARD also passes
the superior robustness of large networks onto the student. In our experiments,
we find that ARD student models decisively outperform adversarially trained
networks of identical architecture in terms of robust accuracy, surpassing
state-of-the-art methods on standard robustness benchmarks. Finally, we adapt
recent fast adversarial training methods to ARD for accelerated robust
distillation.

Comment: Accepted to AAAI Conference on Artificial Intelligence, 2020.
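As one way to make the distillation objective concrete, below is a minimal PyTorch-style sketch of robust distillation: the student is trained to match the teacher's soft labels on adversarially perturbed inputs while keeping a clean cross-entropy term. The attack loop, temperature, and loss weighting here are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def robust_distillation_loss(student, teacher, x, y,
                             eps=8/255, step=2/255, n_steps=10,
                             temp=4.0, lam=0.9):
    """Illustrative robust-distillation step (hyperparameters are assumptions)."""
    with torch.no_grad():
        t_soft = F.softmax(teacher(x) / temp, dim=1)   # teacher sees clean images

    # Craft adversarial examples against the student by ascending the KL
    # divergence between the student's perturbed output and the teacher's
    # clean soft labels (a PGD-style inner loop).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(student(x_adv) / temp, dim=1),
                      t_soft, reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    # Distill on the adversarial inputs, plus a clean cross-entropy term.
    adv_kl = F.kl_div(F.log_softmax(student(x_adv) / temp, dim=1),
                      t_soft, reduction="batchmean")
    return lam * (temp ** 2) * adv_kl + (1 - lam) * F.cross_entropy(student(x), y)
```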
Adversarial Robustness and Robust Meta-Learning for Neural Networks
Despite the overwhelming success of neural networks for pattern recognition, these models behave categorically differently from humans. Adversarial examples, small perturbations that are often undetectable to the human eye, easily fool neural networks, demonstrating that neural networks lack the robustness of human classifiers. This thesis comprises three parts. First, we motivate the study of defense against adversarial examples with a case study on algorithmic trading in which robustness may be critical for security reasons. Second, we develop methods for hardening neural networks against an adversary, especially in the low-data regime, where meta-learning methods achieve state-of-the-art results. Finally, we discuss several properties of the neural network models we use. These properties are of interest beyond robustness to adversarial examples, and they extend to the broad setting of deep learning.
The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
No free lunch theorems for supervised learning state that no learner can
solve all problems or that all learners achieve exactly the same accuracy on
average over a uniform distribution on learning problems. Accordingly, these
theorems are often referenced in support of the notion that individual problems
require specially tailored inductive biases. While virtually all uniformly
sampled datasets have high complexity, real-world problems disproportionately
generate low-complexity data, and we argue that neural network models share
this same preference, formalized using Kolmogorov complexity. Notably, we show
that architectures designed for a particular domain, such as computer vision,
can compress datasets on a variety of seemingly unrelated domains. Our
experiments show that pre-trained and even randomly initialized language models
prefer to generate low-complexity sequences. Whereas no free lunch theorems
seemingly indicate that individual problems require specialized learners, we
explain how tasks that often require human intervention, such as picking an
appropriately sized model when labeled data is scarce or plentiful, can be
automated into a single learning algorithm. These observations justify the
trend in deep learning of unifying seemingly disparate problems with an
increasingly small set of machine learning models.
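The compression claim can be made concrete with a simple proxy: the code length a model assigns to data is a computable upper-bound stand-in for Kolmogorov complexity, and a sequence is "low-complexity" for the model when that code length falls well below the raw encoding length. The sketch below assumes a Hugging Face causal language model; the choice of `gpt2` and the repeated-sentence example are illustrative.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def code_length_bits(text, model_name="gpt2"):
    """Total negative log2-likelihood of `text` under a causal language model:
    a computable stand-in for Kolmogorov complexity (lower = more compressible)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        nats_per_token = model(ids, labels=ids).loss.item()  # mean cross-entropy
    n_predicted = ids.shape[1] - 1
    return nats_per_token * n_predicted / math.log(2)

# A repetitive (low-complexity) string should cost far fewer bits under the
# model than its raw 8-bits-per-byte encoding.
text = "the quick brown fox jumps over the lazy dog. " * 8
print(f"{code_length_bits(text):.0f} model bits vs {8 * len(text.encode())} raw bits")
```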
Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
Deep neural networks are susceptible to shortcut learning, using simple
features to achieve low training loss without discovering essential semantic
structure. Contrary to prior belief, we show that generative models alone are
not sufficient to prevent shortcut learning, despite an incentive to recover a
more comprehensive representation of the data than discriminative approaches.
However, we observe that shortcuts are preferentially encoded with minimal
information, a fact that generative models can exploit to mitigate shortcut
learning. In particular, we propose Chroma-VAE, a two-pronged approach where a
VAE classifier is initially trained to isolate the shortcut in a small latent
subspace, allowing a secondary classifier to be trained on the complementary,
shortcut-free latent subspace. In addition to demonstrating the efficacy of
Chroma-VAE on benchmark and real-world shortcut learning tasks, our work
highlights the potential for manipulating the latent space of generative
classifiers to isolate or interpret specific correlations.

Comment: Presented at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
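The latent-splitting idea can be sketched as a VAE whose latent vector is partitioned into a small subspace z_s and a larger complement z_c, with a stage-one classifier head attached only to z_s and a stage-two classifier trained on z_c with the encoder frozen. The layer sizes, loss weighting, and two-stage schedule below are simplified assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortcutSplitVAE(nn.Module):
    """VAE whose latent vector is split into a small 'shortcut' subspace z_s
    and a larger complementary subspace z_c (all sizes are illustrative)."""
    def __init__(self, in_dim=784, latent=32, shortcut_dims=2, n_classes=10):
        super().__init__()
        self.shortcut_dims = shortcut_dims
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))
        self.head_s = nn.Linear(shortcut_dims, n_classes)           # stage 1
        self.head_c = nn.Linear(latent - shortcut_dims, n_classes)  # stage 2

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

def stage1_loss(model, x, y, beta=1.0):
    # Train encoder, decoder, and the small-subspace head so that the easily
    # learned shortcut is squeezed into the low-capacity subspace z_s.
    z, mu, logvar = model.encode(x)
    recon = F.mse_loss(model.dec(z), x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    ce = F.cross_entropy(model.head_s(z[:, :model.shortcut_dims]), y)
    return recon + beta * kld + ce

def stage2_loss(model, x, y):
    # With the encoder frozen, train a second classifier on the complementary,
    # (ideally) shortcut-free subspace z_c.
    with torch.no_grad():
        z, _, _ = model.encode(x)
    return F.cross_entropy(model.head_c(z[:, model.shortcut_dims:]), y)
```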
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Cutting-edge diffusion models produce images with high quality and
customizability, enabling them to be used for commercial art and graphic design
purposes. But do diffusion models create unique works of art, or are they
replicating content directly from their training sets? In this work, we study
image retrieval frameworks that enable us to compare generated images with
training samples and detect when content has been replicated. Applying our
frameworks to diffusion models trained on multiple datasets including Oxford
flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training
set size impact rates of content replication. We also identify cases where
diffusion models, including the popular Stable Diffusion model, blatantly copy
from their training data.

Comment: Updated draft with the following changes: (1) clarified the LAION Aesthetics versions everywhere; (2) corrected which LAION Aesthetics version SD-1.4 is fine-tuned on and updated Figure 12 accordingly; (3) added a section on possible causes of replication.
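A retrieval pipeline of this kind can be approximated with off-the-shelf components: embed training and generated images with a fixed feature extractor and flag generated images whose nearest training neighbor is suspiciously similar. The ResNet-50 descriptor and cosine-similarity threshold below are stand-in assumptions; the paper's own retrieval features may differ.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Hypothetical descriptor: pooled features from a pretrained ResNet-50.
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()
encoder.eval()

@torch.no_grad()
def embed(images):
    """images: (N, 3, 224, 224), ImageNet-normalized -> unit-norm descriptors."""
    return F.normalize(encoder(images), dim=1)

@torch.no_grad()
def flag_replications(generated, train_feats, threshold=0.95):
    """For each generated image, return its nearest training index, the cosine
    similarity, and whether it crosses an (illustrative) replication threshold.
    `train_feats` are training-set descriptors precomputed with embed()."""
    sims = embed(generated) @ train_feats.T
    best_sim, best_idx = sims.max(dim=1)
    return best_idx, best_sim, best_sim > threshold
```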
Understanding and Mitigating Copying in Diffusion Models
Images generated by diffusion models like Stable Diffusion are increasingly
widespread. Recent works and even lawsuits have shown that these models are
prone to replicating their training data, unbeknownst to the user. In this
paper, we first analyze this memorization problem in text-to-image diffusion
models. While it is widely believed that duplicated images in the training set
are responsible for content replication at inference time, we observe that the
text conditioning of the model plays a similarly important role. In fact, we
see in our experiments that data replication often does not happen for
unconditional models, while it is common in the text-conditional case.
Motivated by our findings, we then propose several techniques for reducing data
replication at both training and inference time by randomizing and augmenting
image captions in the training set.

Comment: 17 pages, preprint. Code is available at https://github.com/somepago/DC
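One illustrative form such caption randomization could take is sketched below: with some probability, substitute a generic caption, and otherwise randomly drop or replace words, loosening the exact caption-image pairing that encourages memorization. The specific augmentations and probabilities are assumptions for illustration; the linked repository documents the actual techniques.

```python
import random

def randomize_caption(caption, vocab, p_generic=0.05, p_drop=0.1, p_replace=0.1):
    """Illustrative caption augmentation for diffusion training: occasionally
    return a generic caption, otherwise randomly drop or replace words."""
    if random.random() < p_generic:
        return "a photo"                       # hypothetical generic caption
    words = []
    for word in caption.split():
        r = random.random()
        if r < p_drop:
            continue                           # drop this word
        if r < p_drop + p_replace:
            words.append(random.choice(vocab)) # swap in a random token
        else:
            words.append(word)
    return " ".join(words) if words else caption

# Example: re-augment the caption each time the image is sampled during training.
vocab = ["dog", "tree", "blue", "painting", "street"]
print(randomize_caption("a golden retriever playing in the park", vocab))
```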